Scheduling Data Intensive Particle Physics Analysis Jobs on Clusters of PCs

Author

  • S. Ponce
Abstract

Scheduling policies are proposed for parallelizing data-intensive particle physics analysis applications on computer clusters. Particle physics analysis jobs require the analysis of tens of thousands of particle collision events, each event typically requiring 200 ms of processing time and 600 KB of data. Many jobs are launched concurrently by a large number of physicists. At first glance, particle physics jobs seem easy to parallelize, since particle collision events can be processed independently of one another. However, since large amounts of data need to be accessed, the real challenge lies in making efficient use of the underlying computing resources. We propose several job parallelization and scheduling policies aimed at reducing job processing times and increasing the sustainable load of a cluster server. The complexity of each policy is analysed as a measure of the scalability of the system. Since particle collision events are usually reused by several jobs, cache-based job splitting strategies considerably increase cluster utilisation and reduce job processing times. Compared with straightforward job scheduling on a processing farm, cache-based first-in, first-out job splitting improves average response times by an order of magnitude and reduces job waiting times in the system’s queues from hours to minutes. By scheduling the jobs out of order, according to the availability of their collision events in the node disk caches, response times are further reduced, especially at high loads. In the delayed scheduling policy, job requests are accumulated during a time period, divided into subjob requests according to a parameterizable subjob size, and scheduled at the beginning of the next time period according to the availability of their data segments within the node disk caches. Delayed scheduling sustains a load close to the maximal theoretically sustainable load of a cluster, but at the cost of longer average response times.
We also propose an adaptive delay scheduling approach, in which the scheduling delay is adapted to the current load. This approach sustains very high loads and offers low response times at normal loads. We analyse the benefits of pipelining computation and accesses to tertiary storage and to the local disk caches. Pipelining tends to increase the throughput of jobs and allows the system to sustain higher loads. Finally, we analyse the complexity of the different scheduling algorithms in terms of both space and time. The system is highly scalable and supports a cluster of up to several tens of thousands of nodes.
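
As a rough illustration of the cache-based job splitting and delayed scheduling ideas described in the abstract, the following Python sketch splits each job's events into subjobs of a configurable size, groups them by the node whose disk cache already holds the events, and, at the start of a scheduling period, orders cache-resident subjobs ahead of those that must be fetched from storage. The function names (`split_job_by_cache`, `schedule_period`), the representation of caches as sets of event ids, and the "cached first" ordering rule are all assumptions made for this example; they are not taken from the paper, whose actual policies may differ substantially.

```python
from collections import defaultdict


def split_job_by_cache(job_events, node_caches, subjob_size):
    """Split one job into (node, events) subjobs, preferring cached data.

    Hypothetical helper: node_caches maps a node name to the set of event
    ids in its disk cache. A node of None means "not cached anywhere".
    """
    by_node = defaultdict(list)
    uncached = []
    for event in job_events:
        # Pick the first node (in dict order) whose cache holds this event.
        owner = next((n for n, cache in node_caches.items() if event in cache), None)
        if owner is None:
            uncached.append(event)
        else:
            by_node[owner].append(event)

    subjobs = []
    for node, events in by_node.items():
        # Cut each node's share into chunks of at most subjob_size events.
        for i in range(0, len(events), subjob_size):
            subjobs.append((node, events[i:i + subjob_size]))
    for i in range(0, len(uncached), subjob_size):
        subjobs.append((None, uncached[i:i + subjob_size]))
    return subjobs


def schedule_period(pending_jobs, node_caches, subjob_size):
    """Order all subjobs accumulated during one period (assumed rule).

    Cache-resident subjobs come first; sorted() is stable, so arrival
    order is preserved within each group.
    """
    subjobs = []
    for job_events in pending_jobs:
        subjobs.extend(split_job_by_cache(job_events, node_caches, subjob_size))
    return sorted(subjobs, key=lambda subjob: subjob[0] is None)


if __name__ == "__main__":
    caches = {"n1": {0, 1, 2, 3}, "n2": {4, 5}}
    for node, events in schedule_period([[6, 7, 0, 1], [4, 5]], caches, 2):
        print(node, events)
```

In this sketch, lengthening the accumulation period gives the scheduler more subjobs to match against the caches, which is one way to picture the trade-off the abstract reports between sustainable load and average response time.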

Similar resources

Issues in Petabyte Data Indexing, Retrieval and Analysis

We propose several methods for speeding up the processing of particle physics data on clusters of PCs. We present a new way of indexing and retrieving data in a high dimensional space by making use of two levels of catalogues enabling an efficient data preselection. We propose several scheduling policies for parallelizing data intensive particle physics applications on clusters of PCs. We show ...

A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose some complex problems, such as job scheduling. Most existing scheduling algorithms in Grids only focus on one kind of Grid jobs which can be data...

Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs

Nowadays, by successful application of on time production concept in other concepts like production management and storage, the need to complete the processing of jobs in their delivery time is considered a key issue in industrial environments. Unrelated parallel machines scheduling is a general mood of classic problems of parallel machines. In some of the applications of unrelated parallel mac...

Data Intensive High Energy Physics Analysis in a Distributed Cloud

We show that distributed Infrastructure-as-a-Service (IaaS) compute clouds can be effectively used for the analysis of high energy physics data. We have designed a distributed cloud system that works with any application using large input data sets requiring a high throughput computing environment. The system uses IaaS-enabled science and commercial clusters in Canada and the United States. We d...

Publication date: 2005